This paper introduces a novel method of adding intrinsic bonuses to a task-oriented reward function in order to facilitate exploration in reinforcement learning efficiently. Although various bonuses have been designed to date, they can be regarded as analogous to the depth-first and breadth-first search algorithms in graph theory. This paper therefore first designs two bonuses, one corresponding to each search strategy. A heuristic gain scheduling is then applied to the designed bonuses, inspired by the iterative deepening search, which is known to inherit the advantages of both search algorithms. The proposed method is expected to allow the agent to efficiently reach the best solution in deeper states by gradually exploring unknown states. In three locomotion tasks with dense rewards and three simple tasks with sparse rewards, it is shown that the two types of bonuses contribute complementarily to the performance improvement on the different tasks. In addition, by combining them with the proposed gain scheduling, all tasks can be accomplished with high performance.
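As a rough illustration of the scheduling idea only (not the authors' implementation), the sketch below blends a breadth-first-style bonus and a depth-first-style bonus under gains scheduled over training, loosely mimicking iterative deepening. The linear schedule, gain magnitude, and function names are assumptions.

```python
import numpy as np

def scheduled_bonus(r_task, b_depth, b_breadth, step, total_steps, beta_max=0.1):
    """Augment the task reward with two intrinsic bonuses whose gains are
    scheduled over training: the breadth-first-style bonus dominates early,
    the depth-first-style bonus dominates later. All gains and the linear
    schedule are illustrative assumptions."""
    progress = np.clip(step / total_steps, 0.0, 1.0)
    beta_breadth = beta_max * (1.0 - progress)   # decays as training proceeds
    beta_depth = beta_max * progress             # grows as training proceeds
    return r_task + beta_breadth * b_breadth + beta_depth * b_depth
```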
Sampling-based model predictive control (MPC) can be applied to versatile robotic systems. However, real-time control with it remains a major challenge due to unstable updates and poor convergence. This paper tackles this challenge with a novel derivation from the reverse Kullback-Leibler divergence, which has a mode-seeking behavior and is therefore likely to find one of the sub-optimal solutions early. This derivation yields a weighted maximum likelihood estimation with positive/negative weights, which is solved by a mirror descent (MD) algorithm. While the negative weights eliminate unnecessary actions, they require a practical implementation based on rejection sampling that avoids interference between the positive and negative updates. In addition, although the convergence of MD can be accelerated with Nesterov's acceleration method, the method is modified for the proposed MPC with a heuristic step size that adapts to the noise estimated in the update amounts. In real-time simulations, the proposed method statistically solves more tasks than the conventional method and, thanks to the improved acceleration, accomplishes more complex tasks using only a CPU. In addition, its applicability is demonstrated in variable impedance control of a force-driven mobile robot. https://youtu.be/D8bFMzct1XM
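To make the positive/negative weighting concrete, here is a minimal sketch of one weighted update of a Gaussian policy mean from sampled action sequences. It uses a plain gradient-style step as a stand-in for the paper's mirror-descent update, and the weight normalization and step size are assumptions.

```python
import numpy as np

def weighted_mean_update(mu, samples, weights, step_size=0.5):
    """One simplified update of a sampling-based MPC policy mean. Positive
    weights pull the mean toward useful action sequences, negative weights
    push it away from unnecessary ones. This is an illustrative stand-in for
    the paper's mirror-descent solution, not the actual algorithm."""
    w = np.asarray(weights, dtype=float)
    w_pos = np.clip(w, 0.0, None)
    w_neg = np.clip(w, None, 0.0)
    if w_pos.sum() > 0:
        w_pos = w_pos / w_pos.sum()          # positive part sums to +1
    if -w_neg.sum() > 0:
        w_neg = w_neg / -w_neg.sum()         # negative part sums to -1
    direction = (w_pos + w_neg) @ (samples - mu)   # weighted-MLE-style direction
    return mu + step_size * direction
```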
Extracting a low-dimensional latent space from high-dimensional observation data is essential for building a real-time robot controller with a world model on the extracted latent space. However, there is no established method that automatically adjusts the dimensionality of the latent space so that it finds the necessary and sufficient size, i.e., the minimal realization of the world model. In this study, we analyze and improve the Tsallis-based variational autoencoder (q-VAE) and reveal that, under an appropriate configuration, it consistently helps make the latent space sparse. Even when the pre-specified dimensionality of the latent space is redundant compared to the minimal realization, this sparsity collapses the unnecessary dimensions, making them easy to remove. We experimentally verify the benefit of this sparsity with the proposed method, which easily finds the necessary and sufficient six dimensions for a mobile manipulator that requires a six-dimensional state space. Moreover, by planning with a minimal-realization world model learned on the extracted dimensions, the proposed method can produce optimal action sequences in real time, reducing the time to task accomplishment by about 20%. The accompanying video has been uploaded to YouTube: https://youtu.be/-qjitrnxars
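A small sketch of how collapsed dimensions could be identified and removed after training a sparsity-inducing VAE such as the q-VAE discussed above. The variance-based activity criterion and threshold are assumptions, not the paper's exact procedure.

```python
import numpy as np

def active_latent_dims(mu_batch, threshold=1e-2):
    """Identify latent dimensions that remain informative. A dimension whose
    posterior means barely vary across the dataset is treated as collapsed
    and can be dropped, leaving the minimal realization. The criterion and
    threshold are illustrative assumptions."""
    activity = np.var(mu_batch, axis=0)        # per-dimension variance of posterior means
    keep = np.where(activity > threshold)[0]   # indices of active (kept) dimensions
    return keep, activity
```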
With deep learning applications becoming more practical, practitioners are inevitably faced with datasets corrupted by various kinds of noise such as measurement errors, mislabeling, and estimated surrogate inputs/outputs, which can negatively affect the optimization results. As a safety net, it is natural to improve the noise robustness of the optimization algorithm that updates the network parameters in the final stage of learning. Previous works revealed that the first momentum used in Adam-like stochastic gradient descent optimizers can be modified based on the Student's t-distribution to produce updates that are robust to noise. In this paper, we propose AdaTerm, which derives not only the first momentum but all of the involved statistics based on the Student's t-distribution, providing for the first time a unified treatment of the optimization process under the t-distribution statistical model. When the computed gradients statistically appear to be aberrant, AdaTerm excludes them from the update and reinforces its robustness for subsequent updates; otherwise, it updates the network parameters normally and relaxes its robustness for the following updates. With this noise-adaptive behavior, AdaTerm's excellent learning performance was confirmed on typical optimization problems under several cases with different and/or unknown noise ratios. In addition, we derived a new general technique for obtaining a theoretical regret bound without AMSGrad.
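The following sketch illustrates the general spirit of a Student's-t-based momentum update: a gradient that deviates strongly from the running statistics receives a small weight, so outliers barely move the estimates. This is a simplified illustration, not AdaTerm's actual derivation; the degrees of freedom, decay rate, and update form are assumptions.

```python
import numpy as np

def robust_momentum_step(m, v, grad, dof=5.0, beta=0.9, eps=1e-8):
    """Simplified t-distribution-style update of the first moment m and the
    scale v. The weight w shrinks for aberrant gradients (large deviation),
    which suppresses their influence. Illustrative only."""
    d = grad.size
    dev = np.sum((grad - m) ** 2 / (v + eps))      # Mahalanobis-like squared deviation
    w = (dof + d) / (dof + dev)                    # t-distribution weight, small for outliers
    k = (1.0 - beta) * w / ((1.0 - beta) * w + beta)
    m_new = m + k * (grad - m)                     # weighted update of first moment
    v_new = v + k * ((grad - m) ** 2 - v)          # weighted update of scale
    return m_new, v_new
```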
Autonomous driving has made great progress and is being introduced into practical use step by step. Meanwhile, the concept of personal mobility is also gaining popularity, and autonomous driving specialized for each individual driver of a personal mobility vehicle is a new step forward. However, it is difficult to collect from an individual driver the large driving dataset that learning-based autonomous driving essentially requires. In addition, when the driver is not familiar with operating the personal mobility vehicle, the dataset will contain non-optimal data. This study therefore focuses on an autonomous driving method for personal mobility that works with such a small and noisy, so-called personal, dataset. Specifically, we introduce a new loss function based on Tsallis statistics that weights the gradients according to the original loss function, allowing us to exclude noisy data during the optimization phase. In addition, we improve a visualization technique to verify whether the driver and the controller share the same regions of interest. The experimental results show that conventional autonomous driving fails to drive properly because of the erroneous operations contained in the personal dataset, and its regions of interest differ from the driver's behavior. In contrast, the proposed method learns robustly against the errors and drives automatically while attending to regions similar to the driver's. The accompanying video has also been uploaded to YouTube: https://youtu.be/keq8-boxyqa
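A hedged sketch of the sample-weighting idea: samples with a large per-sample loss (likely noisy or erroneous demonstrations) receive a small, heavy-tailed weight, so their gradients are suppressed. The exact weighting function below is an assumption standing in for the paper's Tsallis-based loss.

```python
import torch

def tsallis_style_weighted_loss(per_sample_loss, q=1.5):
    """Illustrative sample-weighted loss: the weight decays like a power law
    as the per-sample loss grows, down-weighting suspicious samples during
    optimization. Not the paper's exact formulation."""
    with torch.no_grad():
        scale = per_sample_loss.mean().clamp(min=1e-8)
        weights = (1.0 + (q - 1.0) * per_sample_loss / scale) ** (-1.0 / (q - 1.0))
        weights = weights / weights.mean().clamp(min=1e-8)  # keep average weight near 1
    return (weights * per_sample_loss).mean()
```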
To control humanoid robots, the reference pose of the end effector(s) is planned in task space and then mapped to reference joints by inverse kinematics (IK). By viewing this problem as approximate quadratic programming (QP), recent QP solvers can solve it precisely, but iterative numerical IK solvers based on the Jacobian are still in high demand due to their low computational cost. However, conventional Jacobian-based IK usually clamps the obtained joints during iteration according to the constraints in practice, causing numerical instability due to a non-smooth objective function. To alleviate this clamping problem, this study explicitly considers the joint constraints, in particular box constraints, inside a new IK solver. Specifically, instead of clamping, a mirror descent (MD) method with a box-constrained real joint space and an unconstrained mirror space is integrated with the Jacobian-based IK, yielding the so-called MD-IK. In addition, to escape local optima near the boundaries of the constraints, a heuristic technique called $\epsilon$-clamping is implemented as a margin at the software level. Finally, to increase convergence speed, an acceleration method for MD is integrated, assuming continuity of the solutions over time. As a result, the accelerated MD-IK achieved more stable and sufficiently fast tracking performance compared to conventional IK solvers. The low computational cost of the proposed method mitigated the time delay until a solution is obtained in real-time humanoid gait control, achieving a more stable gait.
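The sketch below illustrates one iteration of the mirror-descent idea: joints are mapped from their box constraints to an unconstrained mirror space, updated there with a Jacobian-based direction, and mapped back, so no hard clamping is needed. The logit mirror map, Jacobian-transpose direction, step size, and epsilon margin are illustrative assumptions, not the exact MD-IK formulation.

```python
import numpy as np

def mdik_step(q, q_min, q_max, jacobian, pose_error, step=0.5, eps=1e-3):
    """One mirror-descent-style IK iteration with box constraints handled by
    a sigmoid/logit mirror map. Illustrative sketch only."""
    # Normalize joints to (0, 1) with a small margin (cf. epsilon-clamping).
    t = (q - q_min) / (q_max - q_min)
    t = np.clip(t, eps, 1.0 - eps)
    z = np.log(t / (1.0 - t))                   # map to unconstrained mirror space
    z = z + step * (jacobian.T @ pose_error)    # Jacobian-based update in mirror space
    t = 1.0 / (1.0 + np.exp(-z))                # map back into (0, 1)
    return q_min + t * (q_max - q_min)          # joints always satisfy the box constraints
```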
Deep reinforcement learning (DRL) is a promising approach for teaching robots to perform complex tasks. Because methods that directly reuse stored experience data cannot follow environmental changes in robotic problems with time-varying environments, online DRL is required. The eligibility traces method is an online learning technique for improving sample efficiency in traditional reinforcement learning with linear regression, rather than in DRL. The dependency between the parameters of deep neural networks destroys the eligibility traces, which is why they have not been integrated with DRL. Although replacing the accumulated gradient with the most influential gradient can mitigate this problem, the replacement operation reduces the reuse of previous experiences. To address these issues, this study proposes a new eligibility traces method that can be used even in DRL while maintaining high sample efficiency. When the accumulated gradient differs from the gradient computed with the latest parameters, the proposed method considers the divergence between the past and latest parameters in order to adaptively decay the eligibility traces. Since the divergence between the past and latest parameters is computationally infeasible, the Bregman divergence between the outputs of the past and latest parameters is exploited instead. In addition, a generalized method with traces over multiple time scales is designed for the first time. This design allows the replacement by the most influential gradient and the adaptive accumulation (decay) of the eligibility traces to be combined.
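A minimal sketch of the adaptive decay idea: the trace decays faster when the latest network outputs diverge from those that produced the accumulated gradients. KL divergence is used here as a concrete instance of the Bregman divergence mentioned above; the exponential decay rule and hyperparameters are assumptions.

```python
import numpy as np

def update_trace(trace, grad, old_probs, new_probs, gamma=0.99, lam=0.9):
    """Accumulate an eligibility trace whose decay strengthens when the
    latest outputs diverge from the past ones. Illustrative sketch only."""
    kl = np.sum(old_probs * (np.log(old_probs + 1e-8) - np.log(new_probs + 1e-8)))
    adaptive = np.exp(-kl)                      # large divergence -> faster decay
    return gamma * lam * adaptive * trace + grad
```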
Automated driving technology has gained a lot of momentum in the last few years. For field exploration, navigation is the key to autonomous operation. In difficult scenarios such as a snowy environment, the road is covered with snow and cannot be detected using only basic techniques. This paper introduces the detection of snowy roads in a forest environment using an RGB camera. The method combines a noise filtering technique with morphological operations to classify the image components, under the assumption that the whole road is covered by snow so that the snow region is defined as the road area. In the perspective image of the road, the vanishing point is one factor that delimits the road region, and it is found with a triangle fitting technique. The performance of the algorithm is evaluated with two error values: the false negative rate and the false positive rate. The errors show that the method is highly effective at detecting straight roads but performs poorly on curved roads. In future work, this road region will be combined with depth information from the camera to detect obstacles.
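A rough reconstruction of the described segmentation step: treat the bright (snow-covered) region as the road, filter noise, and clean the mask with morphological opening and closing. Thresholding choices and kernel sizes are assumptions, and the vanishing-point triangle fitting is omitted.

```python
import cv2

def snowy_road_mask(bgr_image):
    """Segment the bright snow region as the road area and clean it with
    morphological operations. Parameters are illustrative assumptions."""
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    blurred = cv2.medianBlur(gray, 5)                        # noise filtering
    _, mask = cv2.threshold(blurred, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (7, 7))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)    # remove small specks
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)   # fill small holes
    return mask
```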
Spatio-temporal modeling, as a canonical task of multivariate time series forecasting, has been a significant research topic in the AI community. To address the underlying heterogeneity and non-stationarity implied in the graph streams, in this study we propose Spatio-Temporal Meta-Graph Learning as a novel graph structure learning mechanism on spatio-temporal data. Specifically, we implement this idea in the Meta-Graph Convolutional Recurrent Network (MegaCRN) by plugging a Meta-Graph Learner, powered by a Meta-Node Bank, into a GCRN encoder-decoder. We conduct a comprehensive evaluation on two benchmark datasets (METR-LA and PEMS-BAY) and a large-scale spatio-temporal dataset that contains a variety of non-stationary phenomena. Our model outperforms the state of the art by a large margin on all three datasets (over 27% in MAE and 34% in RMSE). In addition, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle locations and time slots with different patterns and adapt robustly to different anomalous situations. Code and datasets are available at https://github.com/deepkashiwa20/MegaCRN.
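To convey the meta-node-bank idea in a few lines, the sketch below has each node query a small bank of learnable memory vectors, and the retrieved embeddings define an adaptive adjacency matrix for a downstream GCRN. Dimensions, the attention form, and the softmax adjacency are assumptions; the official MegaCRN repository linked above contains the actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaGraphLearnerSketch(nn.Module):
    """Minimal sketch of a meta-node-bank-style graph learner; not the
    official MegaCRN code."""

    def __init__(self, num_nodes, embed_dim=32, bank_size=10):
        super().__init__()
        self.node_queries = nn.Parameter(torch.randn(num_nodes, embed_dim))
        self.bank = nn.Parameter(torch.randn(bank_size, embed_dim))   # meta-node bank

    def forward(self):
        attn = F.softmax(self.node_queries @ self.bank.T, dim=-1)     # query the bank
        node_embed = attn @ self.bank                                 # meta-node mixture per node
        adj = F.softmax(F.relu(node_embed @ node_embed.T), dim=-1)    # adaptive adjacency
        return adj, node_embed
```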
Traffic forecasting, as a canonical task of multivariate time series forecasting, has been a significant research topic in the AI community. To address the spatio-temporal heterogeneity and non-stationarity implied in the traffic stream, in this study we propose Spatio-Temporal Meta-Graph Learning as a novel graph structure learning mechanism on spatio-temporal data. Specifically, we implement this idea in the Meta-Graph Convolutional Recurrent Network (MegaCRN) by plugging a Meta-Graph Learner, powered by a Meta-Node Bank, into a GCRN encoder-decoder. We conduct a comprehensive evaluation on two benchmark datasets (METR-LA and PEMS-BAY) and a new large-scale traffic speed dataset that contains traffic incident information. Our model outperforms the state of the art by a large margin on all three datasets (over 27% in MAE and 34% in RMSE). In addition, through a series of qualitative evaluations, we demonstrate that our model can explicitly disentangle road links and time slots with different patterns and adapt robustly to anomalous traffic situations. Code and datasets are available at https://github.com/deepkashiwa20/MegaCRN.